running time
Linear Time Approximation Algorithm for Column Subset Selection with Local Search
The Column Subset Selection (CSS) problem has been widely studied in dimensionality reduction and feature selection. The goal of the CSS problem is to output a submatrix S, consisting of k columns from an n × d input matrix A, that minimizes the residual error \|A - SS^\dagger A\|_F^2, where S^\dagger is the Moore-Penrose inverse of S. Many previous approximation algorithms have running times that are non-linear in both n and d, while the existing linear-time algorithms have relatively larger approximation ratios. Additionally, the local search algorithms in existing results for solving the CSS problem are heuristic. To achieve linear running time while maintaining a better approximation ratio using a local search strategy, we propose a local search-based approximation algorithm for the CSS problem with exactly k columns selected.
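To make the objective concrete, the following is a minimal sketch of the residual error \|A - SS^\dagger A\|_F^2 together with a naive single-swap local search. It is an illustration only and not the algorithm proposed in the paper: the function names, the starting set, and the exhaustive swap rule are assumptions, and this brute-force loop does not run in linear time.

```python
import numpy as np


def css_residual(A, cols):
    """Squared Frobenius residual ||A - S S^+ A||_F^2 for S = A[:, cols]."""
    S = A[:, cols]
    # S @ pinv(S) is the orthogonal projector onto the column space of S.
    projection = S @ (np.linalg.pinv(S) @ A)
    return float(np.linalg.norm(A - projection, "fro") ** 2)


def swap_local_search(A, k, max_rounds=50):
    """Hypothetical single-swap heuristic: accept any swap that lowers the residual."""
    d = A.shape[1]
    selected = list(range(k))  # arbitrary starting set (an assumption)
    best = css_residual(A, selected)
    for _ in range(max_rounds):
        improved = False
        for i in range(k):
            for j in range(d):
                if j in selected:
                    continue
                # Replace the i-th selected column with column j.
                candidate = selected[:i] + [j] + selected[i + 1:]
                err = css_residual(A, candidate)
                if err < best - 1e-12:
                    selected, best, improved = candidate, err, True
        if not improved:
            break  # no single swap improves the residual
    return sorted(selected), best
```

Each swap keeps exactly k columns selected, which matches the problem constraint stated above. Recomputing the pseudoinverse for every candidate is what makes this sketch expensive.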
5421e013565f7f1afa0cfe8ad87a99ab-AuthorFeedback.pdf
For now, we report total running times on the cross-validated computational graphs, for a diverse selection of datasets. We will augment this description with a detailed account in the supplementary. Missing values: We selected k-NN imputation because it arguably provides a stronger baseline than simple mean imputation (while being computationally more demanding). However, using EM as an inner loop within a structure search would be computationally quite demanding. Determining the computational graph is far simpler, and can be tackled with cross-validation (as in this paper), or as suggested by the reviewer using AutoML techniques or neural structural search (NAS).
A Missing Details and Proofs We denote the degree of vertex v
We stress that "unweighted" and "weighted" in the linkage measure names refer to the linkage methods. Recall that our approach is based on geometric layering, where we group the edges based on their weights and process all edges within the same layer in parallel. A similar idea is used in the Affinity Clustering algorithm of Bateni et al. [...] Our algorithm starts by randomly coloring each active vertex red or blue with equal probability. Directly applying the random-mate approach (e.g., as applied in [...] Let D be initialized to the identity clustering. O(log n) layers are required to represent every weight in this weight range. Lemma 2.1.
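The two ingredients mentioned above, geometric layering of edges by weight and random red/blue coloring of vertices, can be sketched as follows. This is a rough illustration under stated assumptions rather than the paper's procedure: the layer base of 2, the normalization by the minimum weight, the requirement that weights are positive, and the helper names are all hypothetical.

```python
import math
import random
from collections import defaultdict


def geometric_layers(edges, base=2.0):
    """Group (u, v, weight) edges so weights within one layer differ by a factor < base.

    Assumes all weights are positive; the layer index is measured relative to the
    minimum weight (an assumption made for this sketch).
    """
    w_min = min(w for _, _, w in edges)
    layers = defaultdict(list)
    for u, v, w in edges:
        index = int(math.floor(math.log(w / w_min, base)))
        layers[index].append((u, v, w))
    return dict(layers)


def random_coloring(vertices, rng):
    """Color each vertex red or blue independently with equal probability."""
    return {v: rng.choice(("red", "blue")) for v in vertices}


edges = [("a", "b", 1.0), ("b", "c", 1.5), ("c", "d", 3.0), ("d", "e", 5.0), ("a", "e", 9.0)]
layers = geometric_layers(edges)
colors = random_coloring("abcde", random.Random(0))
```

With a constant base, a weight range whose ratio of largest to smallest weight is polynomial in n yields O(log n) layers, which is consistent with the layer count stated above.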